Cache Pipelining with Partial Operand Knowledge

Authors

  • Erika Gunadi
  • Mikko H. Lipasti
Abstract

Caches consume a significant amount of power in modern microprocessors, and their access time constrains clock frequency. In this paper, we propose a bit-sliced cache that reduces dynamic power consumption, supports a higher clock frequency, and increases cache throughput while adding little complexity. Our bit-sliced cache reduces dynamic power by 20-40% across a variety of cache organizations by activating only the necessary row decoders and subarrays. To reduce cycle time, the cache access is pipelined, which yields higher bandwidth without the complexity, power, and area penalties of an additional cache port. We report cycle time improvements nearly proportional to the degree of bit-slice pipelining, as well as performance improvements averaging 9% and 11% for an out-of-order processor with a 2-sliced and a 4-sliced cache and ALU, respectively.
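
The idea of activating only the necessary row decoders can be illustrated with a small sketch. This is not the authors' design: it assumes a hypothetical cache whose set index arrives in equal-width slices, low-order slice first, and the function name `candidate_rows` and all parameters are invented for illustration. It only shows how each known slice shrinks the set of rows that could still be selected.

```python
def candidate_rows(index_bits, slice_bits, known_slices, partial_index):
    """Return the rows that remain selectable once the low-order
    `known_slices` slices of the set index are known.

    Hypothetical model: the index is `index_bits` wide and is delivered
    in slices of `slice_bits` bits, least significant slice first.
    """
    known_bits = slice_bits * known_slices
    if known_bits > index_bits:
        raise ValueError("more bits known than the index contains")
    mask = (1 << known_bits) - 1          # selects the bits known so far
    target = partial_index & mask
    return [row for row in range(1 << index_bits) if (row & mask) == target]


# 64 rows, index delivered as two 3-bit slices (assumed values).
before = candidate_rows(6, 3, 0, 0)            # nothing known yet
after_one = candidate_rows(6, 3, 1, 0b101)     # low slice known
after_two = candidate_rows(6, 3, 2, 0b110101)  # full index known
print(len(before), len(after_one), len(after_two))
```

In this toy model, each 3-bit slice cuts the candidate rows by a factor of eight, which gives an intuition for why only a subset of decoders and subarrays needs to be active at any pipeline stage.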

Similar resources

Multiple-pass Pipelining: Enhancing In-order Microarchitectures to Out-of-order Performance

Out-of-program-order execution has become almost a ubiquitous characteristic of modern processors because of its ability to tolerate variable memory-instruction latency. As designs are becoming increasingly power-conscious, the cost and complexity of the components of out-of-order execution are becoming problematic. Compilers have generally proven adept at planning useful static instruction-lev...

One Address Computers are Faster and Use Less Memory Space to Execute Arithmetic Assignment Statements

A notation is developed which permits space and time efficiency comparisons of four basic computer architectures in use today for executing Fortran-style assignment statements. From the comparisons, we discover that a suitably designed 1-address architecture (one accumulator machine) outperforms the other architectures in speed of execution and in encoded size of compiled Fortran statements. The...

Stallscope: Illuminating the Black Box

As microprocessors become increasingly more complex, cycle-accurate simulation has become a valuable tool for performance analysis and microarchitectural exploration. However, parallelism, complex interdependencies, and deep pipelining in modern superscalar processors make it difficult to identify how a particular microarchitectural design feature ultimately affects performance, particularly in...

A New Multiplier Using Wallace Structure and Carry Select Adder with Pipelining

Design of a high performance and high-density multiplier is presented. This multiplier is constructed by using the Wallace tree structure with pipelining. A fast carry select adder is used for the final two-operand adder. It is shown that the time delay for the entire multiplier is O(log(n)). The design is particularly carried out for a 32-bit multiplier with two sections of pipelining, to bala...

High Throughput Power-Aware FIR Filter Design Based on Fine-Grain Pipelining Multipliers and Adders

In a regular FIR structure, pipelining the multipliers can improve throughput. But as the operand word length grows, the delay of the addition process becomes another important constraint. In this paper, a novel fine-grain pipelining scheme for high-throughput FIR filters is proposed. By pipelining both multipliers and adders, very high throughput can be achieved. 2-Dimensional pipeline gating tech...

Publication date: 2004